Text Analysis of USA TODAY Climate Change Articles 2010-2019
1 The Data
In this analysis, we examine USA Today articles published during 2010-2019. Using Nexis Uni, articles were pulled whose headline or leading paragraph contained any of the following search terms:
- climate change
- global warming
- climate crisis
This search returned a total of 515 articles on climate change across the ten-year period, distributed as follows.
article_counts <- read_csv("C:/Users/johnt/Desktop/Residential Education/RD/Amy Research/article_counts.csv",
col_types = cols(Year = col_character()))
article_counts %>%
tibble() %>%
gt()

| Year | Articles |
|---|---|
| 2010 | 36 |
| 2011 | 38 |
| 2012 | 29 |
| 2013 | 66 |
| 2014 | 69 |
| 2015 | 99 |
| 2016 | 43 |
| 2017 | 59 |
| 2018 | 28 |
| 2019 | 48 |
1.1 Word Cloud
Aggregating all words in these articles together, we find the following as the top fifteen most frequent words used.
# get a list of the files in the input directory
files <- list.files(path)
text.function <- function(x){
  tmp <- read_file(paste0(path, x))
  tmp <- tibble(text = tmp)
  return(tmp)
}
out1 <- lapply(files, text.function)
##Now we get the full dataframes.
out1_all <- out1
text_all <- lapply(out1_all, setDT) %>%
  rbindlist(. , id="id_var") %>%
  remove_rownames %>%
  column_to_rownames(var="id_var")
text_all <- matrix(unlist(text_all), nrow = 10) # one row per year of combined text
# tokenize all text
tokens_all <- tibble(text = text_all) %>%
  unnest_tokens(word, text)
word_cloud <- tokens_all %>%
  group_by(word) %>%
  summarize(count = n()) %>%
  anti_join(stop_words) %>%
  filter(!(word == "nthe")) %>% # drop the "nthe" artifact left by literal "\n" in the raw text
  arrange(desc(count))
#Remove Numbers
word_cloud<-word_cloud[-grep("\\b\\d+\\b", word_cloud$word),]
tibble(head(word_cloud, 15)) %>%
gt()

| word | count |
|---|---|
| climate | 2474 |
| change | 1655 |
| global | 944 |
| warming | 815 |
| energy | 755 |
| carbon | 635 |
| u.s | 585 |
| president | 570 |
| emissions | 564 |
| people | 556 |
| obama | 522 |
| gas | 484 |
| world | 475 |
| report | 462 |
| power | 435 |
We can visualize this as a word cloud!
# Word cloud
set.seed(6969) # for reproducibility
wordcloud2(data = word_cloud, size = 1, color = 'random-dark', shape = "pentagon")
2 Sentiment Analysis
To best analyze the tone used in the USA Today articles, we break the data up by year and look at the various sentiments used in the texts; we do so with the following lexicons:
- 2015 Lexicoder Sentiment Dictionary
- NRC Lexicon
2.1 2015 Lexicoder Sentiment Dictionary
Designed for analyzing political texts, the 2015 Lexicoder Sentiment Dictionary matches glob-style word patterns and accounts for negation, returning positive and negative sentiment counts in the context of the sentence. For the USA Today articles in total, we see the following:
x <- dfm(text_all, dictionary = data_dictionary_LSD2015)
lexi <- convert(x, to = "data.frame")
year_col <- data.frame(year = c(2010,2011,2012,2013,2014,2015,2016,2017,2018,2019))
lexi2 <- cbind(lexi,year_col) %>%
mutate(neg_tot = negative + neg_positive,
pos_tot = positive + neg_negative) %>%
select(-c(doc_id,negative, neg_positive, positive , neg_negative)) %>%
mutate(sentiment = pos_tot - neg_tot)
Positive <- sum(lexi2$pos_tot)
Negative <- sum(lexi2$neg_tot)
Sentiment <- Positive - Negative
x<- rbind(Positive,Negative,Sentiment)
colnames(x) <- "Score"
kable(x)%>%
kable_styling(position = "center", full_width = FALSE)

|  | Score |
|---|---|
| Positive | 11142 |
| Negative | 11234 |
| Sentiment | -92 |
With an overall sentiment score of -92 across all articles, USA Today stays almost perfectly balanced between positive and negative verbiage. This will be interesting to compare across networks in further research.
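As a minimal sketch of how the dictionary's negation handling feeds the neg_positive and neg_negative columns used above (the sentence here is invented for illustration):

```r
library(quanteda)

# toy sentence: "not good" should be picked up by a negated-positive
# pattern rather than counted as plain positive
toks <- tokens("Warming is not good news for the planet")
lsd <- tokens_lookup(toks, dictionary = data_dictionary_LSD2015)
convert(dfm(lsd), to = "data.frame")
```

This is the same lookup the dfm() call performs internally; inspecting a single sentence makes the four dictionary categories easier to interpret.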
Subtracting the negative score from the positive score gives the complete sentiment value. Calculating the sentiment value for each year we find the following:
ggplot(data = lexi2, aes(x= as.factor(year), y = sentiment, fill = as.factor(year))) +
geom_col(show.legend = FALSE)+
ggtitle("USA Today Sentiments by Year")+
labs(subtitle="Lexicoder Dictionary", x = "Year", y = "Sentiment")+
theme_fivethirtyeight(base_size = 16, base_family = "sans" )

2.2 NRC Lexicon
According to Saif Mohammad and Peter Turney, the NRC Emotion Lexicon associates each word in the English language with eight basic emotions plus positive/negative sentiment:
- anger
- fear
- anticipation
- trust
- surprise
- sadness
- joy
- disgust
- negative sentiment
- positive sentiment
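Before aggregating, it can help to see the shape of the lexicon itself; a quick sketch (assuming the NRC lexicon has already been downloaded via the textdata package) lists the ten categories and the number of words assigned to each:

```r
library(dplyr)
library(tidytext)

# each row of the lexicon is a (word, sentiment) pair, so a single
# word may appear under several categories at once
get_sentiments("nrc") %>%
  count(sentiment)
```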
In analyzing all the texts together, we see the following distribution of these ten attributes.
# get the sentiment from all text:
all_text_nrc <- tokens_all %>%
inner_join(get_sentiments("nrc")) %>% # pull out only sentiment words
count(sentiment) %>% # count each
spread(sentiment, n, fill = 0) # make data wide rather than narrow
#Transpose for Plot
all_nrc_tidy <- all_text_nrc %>%
pivot_longer(everything(), names_to = 'sentiment', values_to = 'count') %>%
arrange(desc(count))
##NRC PLOT
ggplot(data = all_nrc_tidy, aes(reorder(sentiment, -count, sum),y = count, fill = sentiment))+
geom_col(show.legend = FALSE) +
ggtitle("USA Today Sentiments 2010-2019")+
labs(subtitle="NRC Dictionary") +
theme_fivethirtyeight(base_size = 16, base_family = "sans" )+
theme(axis.text.x = element_text(angle = 45, hjust = 1))

Further breaking it down by year, we see trends in the emotions over time as follows:
nrc.function <- function(x){
  tmp <- x %>%
    unnest_tokens(word, text) %>%
    inner_join(get_sentiments("nrc")) %>% # pull out only sentiment words
    count(sentiment) %>% # count each
    spread(sentiment, n, fill = 0)
  return(tmp)
}
out2 <- lapply(out1, nrc.function)
nrc_all <- lapply(out2, setDT) %>%
rbindlist(. , id="id_var") %>%
cbind(year = c(2010,2011,2012,2013,2014,2015,2016,2017,2018,2019)) %>%
select(-id_var)
year_usa <- nrc_all %>%
mutate(year = str_remove(year, '^nrc')) %>%
pivot_longer(cols = -year, names_to = 'sentiment', values_to = 'count') %>%
arrange(desc(count))
##Stacked Bar Plot NRC
ggplot(year_usa, aes(x = year, y = count, fill = sentiment)) +
geom_bar(colour = 'black', stat = 'identity') +
ggtitle("USA Today Sentiments by Year")+
labs(subtitle="NRC Dictionary") +
theme_fivethirtyeight(base_size = 16, base_family = "sans" )

Here is the same data with a line graph.
##Line Plot NRC
ggplot(year_usa, aes(x = year, y = count, group = sentiment, color = sentiment)) +
geom_line() +
ggtitle("USA Today Sentiments by Year")+
labs(subtitle="NRC Dictionary") +
theme_fivethirtyeight(base_size = 16, base_family = "sans" )

NOTE: The data is not scaled in this analysis.
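If scaling were wanted, one hypothetical follow-up (reusing the year_usa frame built above) would convert each year's counts into within-year proportions, so that years with more articles do not dominate the plot:

```r
library(dplyr)

# hypothetical normalization: express each sentiment count as a
# proportion of that year's total sentiment words
year_usa_scaled <- year_usa %>%
  group_by(year) %>%
  mutate(prop = count / sum(count)) %>%
  ungroup()
```

Plotting prop instead of count would then compare the mix of emotions across years rather than their raw volume.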
3 Bibliography
Mohammad, Saif M. (2016). The Sentiment and Emotion Lexicons. National Research Council of Canada.
Young, L. & Soroka, S. (2012). Affective News: The Automated Coding of Sentiment in Political Texts. Political Communication, 29(2), 205–231.